The Unbalanced Gromov Wasserstein Distance: Conic Formulation and Relaxation
Comparing metric measure spaces (i.e. a metric space endowed with a probability distribution) is at the heart of many machine learning problems. The most popular distance between such metric measure spaces is the Gromov-Wasserstein (GW) distance, which is the solution of a quadratic assignment problem. The GW distance is however limited to the comparison of metric measure spaces endowed with a probability distribution. To alleviate this issue, we introduce two Unbalanced Gromov-Wasserstein formulations: a distance and a more tractable upper-bounding relaxation. They both allow the comparison of metric spaces equipped with arbitrary positive measures up to isometries. The first formulation is a positive and definite divergence based on a relaxation of the mass conservation constraint using a novel type of quadratically-homogeneous divergence. This divergence works hand in hand with the entropic regularization approach, which is popular for solving large-scale optimal transport problems. We show that the underlying non-convex optimization problem can be efficiently tackled using a highly parallelizable and GPU-friendly iterative scheme. The second formulation is a distance between mm-spaces up to isometries based on a conic lifting. Lastly, we provide numerical experiments on synthetic examples and domain adaptation data with a Positive-Unlabeled learning task to highlight the salient features of the unbalanced divergence and its potential applications in ML.
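To make the kind of iterative scheme described above concrete, here is a minimal numpy sketch of one possible alternating strategy: the squared-loss GW energy is re-linearized around the current plan and each subproblem is handled with unbalanced Sinkhorn scaling updates (KL marginal relaxation of strength rho, entropic regularization eps). The function name, the initialization and the linearization are illustrative assumptions, not the paper's exact algorithm or its conic formulation.

```python
import numpy as np

def unbalanced_entropic_gw(Dx, Dy, a, b, eps=0.05, rho=1.0, outer=20, inner=50):
    """Illustrative alternating scheme for an entropic, unbalanced GW-type problem.

    Dx, Dy: pairwise distance matrices of the two metric spaces (assumed symmetric).
    a, b:   arbitrary positive weight vectors (they need not sum to one).
    Sketch only: the squared-loss GW energy is re-linearized around the current
    plan and each subproblem is solved with unbalanced Sinkhorn scaling updates.
    """
    pi = np.outer(a, b) / np.sqrt(a.sum() * b.sum())   # crude initial plan
    contraction = rho / (rho + eps)                     # pointwise contraction exponent
    for _ in range(outer):
        # Linearization of the squared-loss GW energy around the current plan.
        C = ((Dx**2) @ pi.sum(1))[:, None] + ((Dy**2) @ pi.sum(0))[None, :] \
            - 2.0 * Dx @ pi @ Dy.T
        K = np.exp(-C / eps)
        u, v = np.ones_like(a), np.ones_like(b)
        for _ in range(inner):                          # unbalanced Sinkhorn scaling loop
            u = (a / (K @ v)) ** contraction
            v = (b / (K.T @ u)) ** contraction
        pi = u[:, None] * K * v[None, :]
    return pi
```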
Sinkhorn Divergences for Unbalanced Optimal Transport
Optimal transport induces the Earth Mover's (Wasserstein) distance between
probability distributions, a geometric divergence that is relevant to a wide
range of problems. Over the last decade, two relaxations of optimal transport
have been studied in depth: unbalanced transport, which is robust to the
presence of outliers and can be used when distributions don't have the same
total mass; entropy-regularized transport, which is robust to sampling noise
and lends itself to fast computations using the Sinkhorn algorithm. This paper
combines both lines of work to put robust optimal transport on solid ground.
Our main contribution is a generalization of the Sinkhorn algorithm to
unbalanced transport: our method alternates between the standard Sinkhorn
updates and the pointwise application of a contractive function. This implies
that entropic transport solvers on grid images, point clouds and sampled
distributions can all be modified easily to support unbalanced transport, with
a proof of linear convergence that holds in all settings. We then show how to
use this method to define pseudo-distances on the full space of positive
measures that satisfy key geometric axioms: (unbalanced) Sinkhorn divergences
are differentiable, positive, definite, convex, statistically robust and avoid
any "entropic bias" towards a shrinkage of the measures' supports
Interpolating between Optimal Transport and MMD using Sinkhorn Divergences
Comparing probability distributions is a fundamental problem in data sciences. Simple norms and divergences such as the total variation and the relative entropy only compare densities in a point-wise manner and fail to capture the geometric nature of the problem. In sharp contrast, Maximum Mean Discrepancies (MMD) and Optimal Transport distances (OT) are two classes of distances between measures that take into account the geometry of the underlying space and metrize the convergence in law. This paper studies the Sinkhorn divergences, a family of geometric divergences that interpolates between MMD and OT. Relying on a new notion of geometric entropy, we provide theoretical guarantees for these divergences: positivity, convexity and metrization of the convergence in law. On the practical side, we detail a numerical scheme that enables the large-scale application of these divergences for machine learning: on the GPU, gradients of the Sinkhorn loss can be computed for batches of a million samples.
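As an illustration of how such a divergence is typically assembled, the sketch below computes a debiased quantity of the form S_eps(a, b) = OT_eps(a, b) - (OT_eps(a, a) + OT_eps(b, b)) / 2 with a log-domain Sinkhorn loop. The helper names and the squared Euclidean cost are assumptions, and a practical large-scale implementation would run on the GPU as the abstract describes.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import logsumexp

def ot_eps(x, y, a, b, eps=0.05, n_iter=200):
    """Entropic OT value between weighted point clouds via log-domain Sinkhorn."""
    C = cdist(x, y, "sqeuclidean")
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(n_iter):
        f = -eps * logsumexp((g[None, :] - C) / eps + np.log(b)[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + np.log(a)[:, None], axis=0)
    return float(a @ f + b @ g)   # dual value at convergence

def sinkhorn_divergence(x, y, a, b, eps=0.05):
    """Debiased Sinkhorn divergence: positive, definite, convex."""
    return (ot_eps(x, y, a, b, eps)
            - 0.5 * ot_eps(x, x, a, a, eps)
            - 0.5 * ot_eps(y, y, b, b, eps))
```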
Faster Unbalanced Optimal Transport: Translation invariant Sinkhorn and 1-D Frank-Wolfe
Unbalanced optimal transport (UOT) extends optimal transport (OT) to take into account mass variations when comparing distributions. This is crucial to make OT successful in ML applications, making it robust to data normalization and outliers. The baseline algorithm is Sinkhorn, but its convergence speed might be significantly slower for UOT than for OT. In this work, we identify the cause of this deficiency, namely the lack of a global normalization of the iterates, which equivalently corresponds to a translation of the dual OT potentials. Our first contribution leverages this idea to develop a provably accelerated Sinkhorn algorithm (coined 'translation invariant Sinkhorn') for UOT, bridging the computational gap with OT. Our second contribution focuses on 1-D UOT and proposes a Frank-Wolfe solver applied to this translation invariant formulation. The linear oracle of each step amounts to solving a 1-D OT problem, resulting in a linear time complexity per iteration. Our last contribution extends this method to the computation of UOT barycenters of 1-D measures. Numerical simulations showcase the convergence speed improvement brought by these three approaches.
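A hedged sketch of the re-centering idea described above: after each pair of dual updates, the potentials are shifted to (f + t, g - t), which leaves the entropic coupling term unchanged; assuming equal KL penalties rho on both marginals, a natural choice of shift balances the two marginal penalty terms and has the simple closed form used below. This illustrates the normalization step only, not the paper's exact solver or its 1-D Frank-Wolfe method.

```python
import numpy as np
from scipy.special import logsumexp

def ti_unbalanced_sinkhorn(C, a, b, eps=0.1, rho=1.0, n_iter=200):
    """Unbalanced Sinkhorn with a translation re-centering of the dual potentials.

    Assumes equal KL penalties rho on both marginals.  The shift t applied after
    each iteration is an illustrative closed form, chosen so that the two soft
    marginal penalty terms are balanced.
    """
    contraction = rho / (rho + eps)
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(n_iter):
        f = -contraction * eps * logsumexp((g[None, :] - C) / eps + np.log(b)[None, :], axis=1)
        g = -contraction * eps * logsumexp((f[:, None] - C) / eps + np.log(a)[:, None], axis=0)
        # Global normalization step: re-center the potentials.
        t = 0.5 * rho * (logsumexp(np.log(a) - f / rho) - logsumexp(np.log(b) - g / rho))
        f, g = f + t, g - t
    plan = a[:, None] * b[None, :] * np.exp((f[:, None] + g[None, :] - C) / eps)
    return plan, f, g
```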
Unbalanced minibatch Optimal Transport; applications to Domain Adaptation
Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large-scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, i.e. minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients and concentration bounds. Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
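A minimal sketch of the minibatch strategy discussed here, assuming uniform weights, a squared Euclidean cost and an entropic unbalanced Sinkhorn subsolver: the loss is the average UOT cost over randomly drawn minibatch pairs. Function names and hyper-parameters are illustrative, not the paper's estimator; xs and xt are arrays of source and target samples with at least m points each.

```python
import numpy as np
from scipy.spatial.distance import cdist

def uot_cost(x, y, eps=0.1, rho=1.0, n_iter=100):
    """Entropic unbalanced OT cost between two point clouds with uniform weights."""
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    C = cdist(x, y, "sqeuclidean")
    K, contraction = np.exp(-C / eps), rho / (rho + eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** contraction
        v = (b / (K.T @ u)) ** contraction
    return float(np.sum(u[:, None] * K * v[None, :] * C))

def minibatch_uot(xs, xt, m=64, n_pairs=32, seed=0):
    """Average UOT cost over randomly drawn source/target minibatch pairs."""
    rng = np.random.default_rng(seed)
    costs = []
    for _ in range(n_pairs):
        i = rng.choice(len(xs), size=m, replace=False)
        j = rng.choice(len(xt), size=m, replace=False)
        costs.append(uot_cost(xs[i], xt[j]))
    return float(np.mean(costs))
```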